AI: streaming-safe middleware, agent-driven Runner, ModelBuilder tool binding#139
Open
bedus-creation wants to merge 16 commits into
Open
AI: streaming-safe middleware, agent-driven Runner, ModelBuilder tool binding#139bedus-creation wants to merge 16 commits into
bedus-creation wants to merge 16 commits into
Conversation
…I intact Swap the per-provider SDK internals of ai/agent.py (_run/_stream over the anthropic/openai/google SDKs) for a single LangChain/LangGraph backend: init_chat_model builds the chat model and create_agent drives the tool loop, with the final AIMessage mapped back to AgentResponse. The user-facing surface is unchanged — prompt/stream/fake/assert_prompted/assert_not_prompted/reset, the lifecycle hooks, and the decorators keep identical signatures. - _build_model() is the seam tests patch to inject a fake chat model. - _build_messages() now renders attachments via Document.to_langchain_block(). - Add Document.to_langchain_block(): inline text, base64 image/file blocks. - Add ai/fakes.py fake_chat_model(): replays scripted AIMessage turns through a GenericFakeChatModel (bind_tools no-op) so the real create_agent loop runs offline; exported from the ai package root. - New optional [langgraph] extra (langchain + langchain-core + langgraph). The 23 tests in tests/ai/test_agent_fake.py stay green and unmodified (fake() short-circuits before the backend). Adds tests/ai/test_agent_langgraph_backend.py exercising the real loop offline: simple reply, full tool-calling loop, usage mapping, attachment blocks, provider mapping, and streaming.
beedb37 to
990e842
Compare
…ent through it Turn ai/config.py into a config package and resolve models/providers through a new Lab helper instead of hardcoded dicts on the Agent. - ai/config/: split provider dataclasses (config.py) from the top-level AIConfig (ai.py), add config/__init__.py re-exporting them, and give each provider a models map keyed by modality (default / default_image / default_audio / default_transcribe). Fix the draft's circular import (AIConfig imported the provider configs from the package root mid-init) and the placeholder model values (google text default, elevenlabs models). - AIConfig selects the default provider per modality: default (text), default_image, default_audio, default_transcribe. image.py/audio.py now read default_image/default_audio (was image_provider/audio_provider). - ai/lab.py: Lab(StrEnum) + ModelType resolve the provider, default model, and the "<langchain-provider>:<model>" URL from Config (google → google_genai). - Agent: _resolve_model() and _build_model() now go through Lab; removed the stale _DEFAULT_MODELS/_LANGCHAIN_PROVIDERS references and the dead _execute_tool() (create_agent runs tools itself). - Tests: drop the _build_model monkeypatch helper; the backend tests now patch the real langchain.chat_models.init_chat_model seam via pytest monkeypatch. Add test_lab.py; update image/audio provider-selection mocks. The 23 tests in tests/ai/test_agent_fake.py stay green and unmodified.
…l loop
Replace the LangGraph create_agent backend with a plain init_chat_model call
driven by a Runner that resolves and executes tool calls itself.
- runner.py: Runner(model, tools, max_steps) binds tools, invokes the model,
executes requested tool calls, feeds results back, loops to a final answer;
StreamRunner yields content tokens through the same loop. Fully typed.
- Agent._run/_stream delegate to Runner/StreamRunner (threading _max_steps);
no create_agent.
- System message is declarative via instructions()/_instructions — removed the
per-call system= and messages= arguments from prompt()/stream().
- Resolve provider/model through Lab directly (dropped the _lab() helper) and
import Lab at module top.
- Config split into ai/config/{ai,config}.py.
KNOWN RED: ai/config/__init__.py is absent, so 'from fastapi_startkit.ai.config
import AIConfig' fails — the AI test suite does not collect and AIProvider import
breaks. Backend tests also still assume the old create_agent result shape and the
tuple return of _build_messages. Follow-ups.
0278c05 to
e31432b
Compare
Add a class-level testing harness so an agent can be faked or recorded
without a real model provider:
- Agent.fake({...}) and Agent.record(path) return an AgentBinding usable
as a context manager or test decorator. The binding swaps a stand-in
into the service container under the agent's class name and auto-resets
on exit, so even a controller's own ChatAgent().prompt(...) is covered.
- FakeAgent answers from glob patterns; RecordingAgent records the real
reply to JSON once, then replays it. Both expose assert_prompted().
- Agent.make()/faked() resolve the bound stand-in for assertions.
- prompt()/stream() delegate to an active binding; the in-process
agent.fake({...}) instance API is preserved via a dual-purpose accessor.
- Lab.ModelType carries the models-map key as a static mapping.
- Import AIConfig from the fastapi_startkit.ai namespace in tests and the
AI facade stub, matching the provider registration.
Tests: full suite 1541 passed, 7 skipped; ruff clean.
e31432b to
dbfc847
Compare
Convert bare assert statements to self.assertEqual in the example/agents feature tests, matching the unittest.IsolatedAsyncioTestCase base. Keep the async test methods and the @ChatAgent.fake / @ChatAgent.record decorators intact.
…sertions test(agents): use unittest assertions in example feature tests
…Lab default fallback - Replace the FakeAccessor descriptor with a single Agent.fake() classmethod; drop the unused faked()/bound aliases and rename the internal stand-in resolver to _faked(), sharing container lookup via _binding(). - Runner.run() now returns the tool result directly instead of looping the output back to the model (custom single-shot tool semantics). - Resolve provider via Lab.get_provider(self._provider) so a None provider falls back to the configured default instead of raising. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Convert the AI tests from pytest function style to unittest.TestCase classes (fake/record, decorators, lab, config, provider, document, response, agent). Rename test_agent_langgraph_backend.py to test_agent.py, add a TestAgentRecord class for record-and-replay, and align expectations with the actual config default provider (google). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ckend AIProvider.register() now resolves AIConfig and merges it into the config store (merge_config_from) instead of binding into the container and setting it in boot(); boot() is a no-op. Drop the unused _memory_backend class attribute on Agent. Update provider tests to the new behaviour. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Agent.prompt() and stream() are now coroutines/async generators, matching the framework's async-first design. The Runner uses ainvoke/astream/ainvoke for tools, the fake/record stand-ins and AgentSnapshot.resolve are async, and the fake/record/agent tests run under IsolatedAsyncioTestCase. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the bundled provider SDKs (anthropic, openai, google-generativeai) and the unused langgraph from the [ai] extra and dev group — providers are pulled lazily by init_chat_model and are now opt-in. Fix stale langgraph references in the fake-model helper to point at the [ai] extra. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n ModelBuilder Streaming previously buffered the entire response when an agent had middleware: `final = await handler(model)` drained the Response stream before returning, so the first token only appeared after the full generation and after-hooks fired late. build_pipeline now hands each middleware a Response-returning handler so layers can attach `.then(callback)` and return without awaiting — streaming-safe, and the after-hook fires once on completion (buffered for prompt, post-stream for stream). Middleware may be sync or async; the example AgentLogger is now sync + `.then()`. Other AI changes: - ModelBuilder binds tools (agent.tools()) onto the chat model; Runner no longer binds, only keeps the tool map for execution. - Runner takes the agent (Runner(agent, model)) instead of threading tools/max_steps separately. - Agent.tools() typed as list[BaseTool]; @model decorator sets `model` (was the unread `_model`). - Tests updated to the current API (instructions() method, provider attr, ModelBuilder._resolve_model) and drop the removed `memory` decorator; add regression tests for streaming-through-middleware and the prompt after-hook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add direct tests for ai/pipeline.py — the Response deferred-callback mechanism and build_pipeline onion that back agent middleware. Locks down that the .then after-hook fires exactly once on both the buffered (await) and streaming (async for) consumption paths, modelled on the AgentLogger request/response logging pattern. Brings pipeline.py to 100% coverage.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hardens the AI agent harness: middleware no longer breaks streaming, tool binding/execution have one clear owner each, and the
Runneris driven by the agent. Tests are realigned to the current API.Streaming through middleware (the main fix)
With middleware attached,
.stream()used to buffer the whole response. A middleware written the natural way —final = await handler(model)— drains the entireResponsestream before returning, so:Fix:
build_pipelinenow hands every middleware aResponse-returning handler, so a layer can attach.then(callback)and return it without awaiting. Awaiting still works forprompt()(buffered), but streaming stays token-by-token and the after-hook fires exactly once on completion (post-stream forstream(), on the final message forprompt()).build_pipelinechecksisinstance(_, Response)before the awaitable check, sinceResponseis itself awaitable (otherwise a synchandlereturning aResponsewould be re-buffered).AgentLoggerconverted to sync +.then().Tool binding / execution ownership
ModelBuilderbindsagent.tools()onto the chat model (what the LLM needs to emit tool calls).Runnerno longer binds — it keeps thename → BaseToolmap purely to execute returned tool calls.bind_toolsstores serialized schemas, not callables, so execution still needs the real tools.Runner takes the agent
Runner(agent, model)/StreamRunner(agent, model)instead of threadingtools/max_stepsthrough every call site.modelstays a separate arg because the middleware pipeline can transform it.Typing & API cleanup
Agent.tools()typed aslist[BaseTool](lazyTYPE_CHECKINGimport) — fixes thetool.name/dict[str, BaseTool]type errors.@modeldecorator now setsmodel(it previously set an unread_model, making the decorator a silent no-op).Tests
test_agent.py/test_agent_decorators.pyto the current API:instructions()method,providerattribute,ModelBuilder._resolve_model; removed the deletedmemorydecorator.Verification
pytest tests/ai/→ 191 passed.🤖 Generated with Claude Code